Debug Hadoop 3.2 profile #24

Closed
wants to merge 17 commits into from

Conversation

HyukjinKwon
Owner

No description provided.

@HyukjinKwon HyukjinKwon closed this Oct 6, 2020
@HyukjinKwon HyukjinKwon deleted the SPARK-29250-debug branch December 7, 2020 02:07
HyukjinKwon pushed a commit that referenced this pull request Dec 21, 2020
…ries if AQE is enabled

### What changes were proposed in this pull request?

This PR fixes an issue where, when AQE is enabled, `EXPLAIN FORMATTED` doesn't show the plan for subqueries.

```scala
val df = spark.range(1, 100)
df.createTempView("df")
spark.sql("SELECT (SELECT min(id) AS v FROM df)").explain("FORMATTED")

== Physical Plan ==
AdaptiveSparkPlan (3)
+- Project (2)
 +- Scan OneRowRelation (1)

(1) Scan OneRowRelation
Output: []
Arguments: ParallelCollectionRDD[0] at explain at <console>:24, OneRowRelation, UnknownPartitioning(0)

(2) Project
Output [1]: [Subquery subquery#3, [id=#20] AS scalarsubquery()#5L]
Input: []

(3) AdaptiveSparkPlan
Output [1]: [scalarsubquery()#5L]
Arguments: isFinalPlan=false
```

After this change, the plan for the subquery is shown.
```scala
== Physical Plan ==
* Project (2)
+- * Scan OneRowRelation (1)

(1) Scan OneRowRelation [codegen id : 1]
Output: []
Arguments: ParallelCollectionRDD[0] at explain at <console>:24, OneRowRelation, UnknownPartitioning(0)

(2) Project [codegen id : 1]
Output [1]: [Subquery scalar-subquery#3, [id=#24] AS scalarsubquery()#5L]
Input: []

===== Subqueries =====

Subquery:1 Hosting operator id = 2 Hosting Expression = Subquery scalar-subquery#3, [id=#24]
* HashAggregate (6)
+- Exchange (5)
   +- * HashAggregate (4)
      +- * Range (3)

(3) Range [codegen id : 1]
Output [1]: [id#0L]
Arguments: Range (1, 100, step=1, splits=Some(12))

(4) HashAggregate [codegen id : 1]
Input [1]: [id#0L]
Keys: []
Functions [1]: [partial_min(id#0L)]
Aggregate Attributes [1]: [min#7L]
Results [1]: [min#8L]

(5) Exchange
Input [1]: [min#8L]
Arguments: SinglePartition, ENSURE_REQUIREMENTS, [id=#20]

(6) HashAggregate [codegen id : 2]
Input [1]: [min#8L]
Keys: []
Functions [1]: [min(id#0L)]
Aggregate Attributes [1]: [min(id#0L)#4L]
Results [1]: [min(id#0L)#4L AS v#2L]
```

### Why are the changes needed?

For better debuggability.

### Does this PR introduce _any_ user-facing change?

Yes. Users can see the formatted plan for subqueries.

### How was this patch tested?

New test.

Closes apache#30855 from sarutak/fix-aqe-explain.

Authored-by: Kousuke Saruta <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
HyukjinKwon pushed a commit that referenced this pull request Dec 21, 2020
…subquery code

### What changes were proposed in this pull request?

This PR fixes an issue where `EXPLAIN CODEGEN` and `BenchmarkQueryTest` don't show the corresponding code for subqueries.

The following example is about `EXPLAIN CODEGEN`.
```
spark.conf.set("spark.sql.adaptive.enabled", "false")
val df = spark.range(1, 100)
df.createTempView("df")
spark.sql("SELECT (SELECT min(id) AS v FROM df)").explain("CODEGEN")

scala> spark.sql("SELECT (SELECT min(id) AS v FROM df)").explain("CODEGEN")
Found 1 WholeStageCodegen subtrees.
== Subtree 1 / 1 (maxMethodCodeSize:55; maxConstantPoolSize:97(0.15% used); numInnerClasses:0) ==
*(1) Project [Subquery scalar-subquery#3, [id=#24] AS scalarsubquery()#5L]
:  +- Subquery scalar-subquery#3, [id=#24]
:     +- *(2) HashAggregate(keys=[], functions=[min(id#0L)], output=[v#2L])
:        +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [id=#20]
:           +- *(1) HashAggregate(keys=[], functions=[partial_min(id#0L)], output=[min#8L])
:              +- *(1) Range (1, 100, step=1, splits=12)
+- *(1) Scan OneRowRelation[]

Generated code:
/* 001 */ public Object generate(Object[] references) {
/* 002 */   return new GeneratedIteratorForCodegenStage1(references);
/* 003 */ }
/* 004 */
/* 005 */ // codegenStageId=1
/* 006 */ final class GeneratedIteratorForCodegenStage1 extends org.apache.spark.sql.execution.BufferedRowIterator {
/* 007 */   private Object[] references;
/* 008 */   private scala.collection.Iterator[] inputs;
/* 009 */   private scala.collection.Iterator rdd_input_0;
/* 010 */   private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter[] project_mutableStateArray_0 = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter[1];
/* 011 */
/* 012 */   public GeneratedIteratorForCodegenStage1(Object[] references) {
/* 013 */     this.references = references;
/* 014 */   }
/* 015 */
/* 016 */   public void init(int index, scala.collection.Iterator[] inputs) {
/* 017 */     partitionIndex = index;
/* 018 */     this.inputs = inputs;
/* 019 */     rdd_input_0 = inputs[0];
/* 020 */     project_mutableStateArray_0[0] = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(1, 0);
/* 021 */
/* 022 */   }
/* 023 */
/* 024 */   private void project_doConsume_0() throws java.io.IOException {
/* 025 */     // common sub-expressions
/* 026 */
/* 027 */     project_mutableStateArray_0[0].reset();
/* 028 */
/* 029 */     if (false) {
/* 030 */       project_mutableStateArray_0[0].setNullAt(0);
/* 031 */     } else {
/* 032 */       project_mutableStateArray_0[0].write(0, 1L);
/* 033 */     }
/* 034 */     append((project_mutableStateArray_0[0].getRow()));
/* 035 */
/* 036 */   }
/* 037 */
/* 038 */   protected void processNext() throws java.io.IOException {
/* 039 */     while ( rdd_input_0.hasNext()) {
/* 040 */       InternalRow rdd_row_0 = (InternalRow) rdd_input_0.next();
/* 041 */       ((org.apache.spark.sql.execution.metric.SQLMetric) references[0] /* numOutputRows */).add(1);
/* 042 */       project_doConsume_0();
/* 043 */       if (shouldStop()) return;
/* 044 */     }
/* 045 */   }
/* 046 */
/* 047 */ }
```

After this change, the corresponding code for subqueries is shown.
```
Found 3 WholeStageCodegen subtrees.
== Subtree 1 / 3 (maxMethodCodeSize:282; maxConstantPoolSize:206(0.31% used); numInnerClasses:0) ==
*(1) HashAggregate(keys=[], functions=[partial_min(id#0L)], output=[min#8L])
+- *(1) Range (1, 100, step=1, splits=12)

Generated code:
/* 001 */ public Object generate(Object[] references) {
/* 002 */   return new GeneratedIteratorForCodegenStage1(references);
/* 003 */ }
/* 004 */
/* 005 */ // codegenStageId=1
/* 006 */ final class GeneratedIteratorForCodegenStage1 extends org.apache.spark.sql.execution.BufferedRowIterator {
/* 007 */   private Object[] references;
/* 008 */   private scala.collection.Iterator[] inputs;
/* 009 */   private boolean agg_initAgg_0;
/* 010 */   private boolean agg_bufIsNull_0;
/* 011 */   private long agg_bufValue_0;
/* 012 */   private boolean range_initRange_0;
/* 013 */   private long range_nextIndex_0;
/* 014 */   private TaskContext range_taskContext_0;
/* 015 */   private InputMetrics range_inputMetrics_0;
/* 016 */   private long range_batchEnd_0;
/* 017 */   private long range_numElementsTodo_0;
/* 018 */   private boolean agg_agg_isNull_2_0;
/* 019 */   private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter[] range_mutableStateArray_0 = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter[3];
/* 020 */
/* 021 */   public GeneratedIteratorForCodegenStage1(Object[] references) {
/* 022 */     this.references = references;
/* 023 */   }
/* 024 */
/* 025 */   public void init(int index, scala.collection.Iterator[] inputs) {
/* 026 */     partitionIndex = index;
/* 027 */     this.inputs = inputs;
/* 028 */
/* 029 */     range_taskContext_0 = TaskContext.get();
/* 030 */     range_inputMetrics_0 = range_taskContext_0.taskMetrics().inputMetrics();
/* 031 */     range_mutableStateArray_0[0] = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(1, 0);
/* 032 */     range_mutableStateArray_0[1] = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(1, 0);
/* 033 */     range_mutableStateArray_0[2] = new org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(1, 0);
/* 034 */
/* 035 */   }
/* 036 */
/* 037 */   private void agg_doAggregateWithoutKey_0() throws java.io.IOException {
/* 038 */     // initialize aggregation buffer
/* 039 */     agg_bufIsNull_0 = true;
/* 040 */     agg_bufValue_0 = -1L;
/* 041 */
/* 042 */     // initialize Range
/* 043 */     if (!range_initRange_0) {
/* 044 */       range_initRange_0 = true;
/* 045 */       initRange(partitionIndex);
/* 046 */     }
/* 047 */
/* 048 */     while (true) {
/* 049 */       if (range_nextIndex_0 == range_batchEnd_0) {
/* 050 */         long range_nextBatchTodo_0;
/* 051 */         if (range_numElementsTodo_0 > 1000L) {
/* 052 */           range_nextBatchTodo_0 = 1000L;
/* 053 */           range_numElementsTodo_0 -= 1000L;
/* 054 */         } else {
/* 055 */           range_nextBatchTodo_0 = range_numElementsTodo_0;
/* 056 */           range_numElementsTodo_0 = 0;
/* 057 */           if (range_nextBatchTodo_0 == 0) break;
/* 058 */         }
/* 059 */         range_batchEnd_0 += range_nextBatchTodo_0 * 1L;
/* 060 */       }
/* 061 */
/* 062 */       int range_localEnd_0 = (int)((range_batchEnd_0 - range_nextIndex_0) / 1L);
/* 063 */       for (int range_localIdx_0 = 0; range_localIdx_0 < range_localEnd_0; range_localIdx_0++) {
/* 064 */         long range_value_0 = ((long)range_localIdx_0 * 1L) + range_nextIndex_0;
/* 065 */
/* 066 */         agg_doConsume_0(range_value_0);
/* 067 */
/* 068 */         // shouldStop check is eliminated
/* 069 */       }
/* 070 */       range_nextIndex_0 = range_batchEnd_0;
/* 071 */       ((org.apache.spark.sql.execution.metric.SQLMetric) references[0] /* numOutputRows */).add(range_localEnd_0);
/* 072 */       range_inputMetrics_0.incRecordsRead(range_localEnd_0);
/* 073 */       range_taskContext_0.killTaskIfInterrupted();
/* 074 */     }
/* 075 */
/* 076 */   }
/* 077 */
/* 078 */   private void initRange(int idx) {
/* 079 */     java.math.BigInteger index = java.math.BigInteger.valueOf(idx);
/* 080 */     java.math.BigInteger numSlice = java.math.BigInteger.valueOf(12L);
/* 081 */     java.math.BigInteger numElement = java.math.BigInteger.valueOf(99L);
/* 082 */     java.math.BigInteger step = java.math.BigInteger.valueOf(1L);
/* 083 */     java.math.BigInteger start = java.math.BigInteger.valueOf(1L);
/* 084 */     long partitionEnd;
/* 085 */
/* 086 */     java.math.BigInteger st = index.multiply(numElement).divide(numSlice).multiply(step).add(start);
/* 087 */     if (st.compareTo(java.math.BigInteger.valueOf(Long.MAX_VALUE)) > 0) {
/* 088 */       range_nextIndex_0 = Long.MAX_VALUE;
/* 089 */     } else if (st.compareTo(java.math.BigInteger.valueOf(Long.MIN_VALUE)) < 0) {
/* 090 */       range_nextIndex_0 = Long.MIN_VALUE;
/* 091 */     } else {
/* 092 */       range_nextIndex_0 = st.longValue();
/* 093 */     }
/* 094 */     range_batchEnd_0 = range_nextIndex_0;
/* 095 */
/* 096 */     java.math.BigInteger end = index.add(java.math.BigInteger.ONE).multiply(numElement).divide(numSlice)
/* 097 */     .multiply(step).add(start);
/* 098 */     if (end.compareTo(java.math.BigInteger.valueOf(Long.MAX_VALUE)) > 0) {
/* 099 */       partitionEnd = Long.MAX_VALUE;
/* 100 */     } else if (end.compareTo(java.math.BigInteger.valueOf(Long.MIN_VALUE)) < 0) {
/* 101 */       partitionEnd = Long.MIN_VALUE;
/* 102 */     } else {
/* 103 */       partitionEnd = end.longValue();
/* 104 */     }
/* 105 */
/* 106 */     java.math.BigInteger startToEnd = java.math.BigInteger.valueOf(partitionEnd).subtract(
/* 107 */       java.math.BigInteger.valueOf(range_nextIndex_0));
/* 108 */     range_numElementsTodo_0  = startToEnd.divide(step).longValue();
/* 109 */     if (range_numElementsTodo_0 < 0) {
/* 110 */       range_numElementsTodo_0 = 0;
/* 111 */     } else if (startToEnd.remainder(step).compareTo(java.math.BigInteger.valueOf(0L)) != 0) {
/* 112 */       range_numElementsTodo_0++;
/* 113 */     }
/* 114 */   }
/* 115 */
/* 116 */   private void agg_doConsume_0(long agg_expr_0_0) throws java.io.IOException {
/* 117 */     // do aggregate
/* 118 */     // common sub-expressions
/* 119 */
/* 120 */     // evaluate aggregate functions and update aggregation buffers
/* 121 */
/* 122 */     agg_agg_isNull_2_0 = true;
/* 123 */     long agg_value_2 = -1L;
/* 124 */
/* 125 */     if (!agg_bufIsNull_0 && (agg_agg_isNull_2_0 ||
/* 126 */         agg_value_2 > agg_bufValue_0)) {
/* 127 */       agg_agg_isNull_2_0 = false;
/* 128 */       agg_value_2 = agg_bufValue_0;
/* 129 */     }
/* 130 */
/* 131 */     if (!false && (agg_agg_isNull_2_0 ||
/* 132 */         agg_value_2 > agg_expr_0_0)) {
/* 133 */       agg_agg_isNull_2_0 = false;
/* 134 */       agg_value_2 = agg_expr_0_0;
/* 135 */     }
/* 136 */
/* 137 */     agg_bufIsNull_0 = agg_agg_isNull_2_0;
/* 138 */     agg_bufValue_0 = agg_value_2;
/* 139 */
/* 140 */   }
/* 141 */
/* 142 */   protected void processNext() throws java.io.IOException {
/* 143 */     while (!agg_initAgg_0) {
/* 144 */       agg_initAgg_0 = true;
/* 145 */       long agg_beforeAgg_0 = System.nanoTime();
/* 146 */       agg_doAggregateWithoutKey_0();
/* 147 */       ((org.apache.spark.sql.execution.metric.SQLMetric) references[2] /* aggTime */).add((System.nanoTime() - agg_beforeAgg_0) / 1000000);
/* 148 */
/* 149 */       // output the result
/* 150 */
/* 151 */       ((org.apache.spark.sql.execution.metric.SQLMetric) references[1] /* numOutputRows */).add(1);
/* 152 */       range_mutableStateArray_0[2].reset();
/* 153 */
/* 154 */       range_mutableStateArray_0[2].zeroOutNullBytes();
/* 155 */
/* 156 */       if (agg_bufIsNull_0) {
/* 157 */         range_mutableStateArray_0[2].setNullAt(0);
/* 158 */       } else {
/* 159 */         range_mutableStateArray_0[2].write(0, agg_bufValue_0);
/* 160 */       }
/* 161 */       append((range_mutableStateArray_0[2].getRow()));
/* 162 */     }
/* 163 */   }
/* 164 */
/* 165 */ }
```

### Why are the changes needed?

For better debuggability.

### Does this PR introduce _any_ user-facing change?

Yes. After this change, users can see subquery code by `EXPLAIN CODEGEN`.

### How was this patch tested?

New test.

Closes apache#30859 from sarutak/explain-codegen-subqueries.

Authored-by: Kousuke Saruta <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
HyukjinKwon pushed a commit that referenced this pull request Jan 12, 2021
…ries if AQE is enabled

Closes apache#30855 from sarutak/fix-aqe-explain.

Authored-by: Kousuke Saruta <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 70da86a)
Signed-off-by: Dongjoon Hyun <[email protected]>
HyukjinKwon pushed a commit that referenced this pull request Jan 12, 2021
…subquery code

Closes apache#30859 from sarutak/explain-codegen-subqueries.

Authored-by: Kousuke Saruta <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit f4e1069)
Signed-off-by: Dongjoon Hyun <[email protected]>
HyukjinKwon pushed a commit that referenced this pull request Jan 12, 2021
…subquery code

Closes apache#30859 from sarutak/explain-codegen-subqueries.

Authored-by: Kousuke Saruta <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit f4e1069)
Signed-off-by: Dongjoon Hyun <[email protected]>
HyukjinKwon pushed a commit that referenced this pull request Feb 3, 2025
This is a trivial change that replaces the loop index type from `int` to `long`. Surprisingly, the microbenchmark shows a more than 2x performance uplift.

Analysis
--------
The hot loop of the `arrayEquals` method is simplified below. The loop index `i` is defined as an `int` and is compared with `length`, which is a `long`, to determine whether the loop should end.
```
public static boolean arrayEquals(
    Object leftBase, long leftOffset, Object rightBase, long rightOffset, final long length) {
  ......
  int i = 0;
  while (i <= length - 8) {
    if (Platform.getLong(leftBase, leftOffset + i) !=
        Platform.getLong(rightBase, rightOffset + i)) {
          return false;
    }
    i += 8;
  }
  ......
}
```

Strictly speaking, there's a code bug here. If `length` is greater than 2^31 + 8, this loop will never end because `i`, as a 32-bit integer, is at most 2^31 - 1. But the compiler must treat this behaviour as intentional and generate code that strictly matches the logic, which prevents it from generating optimal code.

Defining the loop index `i` as a `long` corrects this issue. Besides making the code logic more accurate, it allows the JIT to optimize this code much more aggressively. In the microbenchmark, this trivial change improves performance significantly on both Arm and x86 platforms.
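
For illustration, here is a minimal sketch of what the corrected loop looks like with a `long` index. It borrows Spark's `org.apache.spark.unsafe.Platform` accessors but simplifies the real method; the class name, the tail handling, and the `main` example are assumptions for the sketch, not the actual patch.

```java
// Sketch only: word-by-word comparison with a long loop index, mirroring the fix.
// Assumes spark-unsafe (org.apache.spark.unsafe.Platform) is on the classpath.
import org.apache.spark.unsafe.Platform;

public final class ArrayEqualsSketch {

  public static boolean arrayEquals(
      Object leftBase, long leftOffset, Object rightBase, long rightOffset, final long length) {
    long i = 0;                              // long index: cannot wrap around before reaching length
    while (i <= length - 8) {                // compare 8 bytes per iteration
      if (Platform.getLong(leftBase, leftOffset + i) !=
          Platform.getLong(rightBase, rightOffset + i)) {
        return false;
      }
      i += 8;
    }
    while (i < length) {                     // tail: compare the remaining bytes one at a time
      if (Platform.getByte(leftBase, leftOffset + i) !=
          Platform.getByte(rightBase, rightOffset + i)) {
        return false;
      }
      i += 1;
    }
    return true;
  }

  // Tiny usage example comparing two equal byte[] payloads.
  public static void main(String[] args) {
    byte[] a = "hello spark".getBytes();
    byte[] b = "hello spark".getBytes();
    System.out.println(arrayEquals(
        a, Platform.BYTE_ARRAY_OFFSET, b, Platform.BYTE_ARRAY_OFFSET, a.length));  // prints true
  }
}
```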

Benchmark
---------
Source code:
https://gist.github.com/cyb70289/258e261f388e22f47e4d961431786d1a

Result on Arm Neoverse N2:
```
Benchmark                             Mode  Cnt    Score   Error  Units
ArrayEqualsBenchmark.arrayEqualsInt   avgt   10  674.313 ± 0.213  ns/op
ArrayEqualsBenchmark.arrayEqualsLong  avgt   10  313.563 ± 2.338  ns/op
```

Result on Intel Cascade Lake:
```
Benchmark                             Mode  Cnt     Score   Error  Units
ArrayEqualsBenchmark.arrayEqualsInt   avgt   10  1130.695 ± 0.168  ns/op
ArrayEqualsBenchmark.arrayEqualsLong  avgt   10   461.979 ± 0.097  ns/op
```
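
The gist linked above holds the actual benchmark source. As a rough idea of its shape, here is a minimal JMH sketch (illustrative only, not the gist's code) that pits an `int` loop index against a `long` one over the same word-by-word comparison; it assumes JMH and Spark's `Platform` class are available on the classpath.

```java
import java.util.concurrent.TimeUnit;

import org.apache.spark.unsafe.Platform;
import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Fork(1)
@Warmup(iterations = 5)
@Measurement(iterations = 10)
@State(Scope.Benchmark)
public class ArrayEqualsBenchmark {

  private byte[] left;
  private byte[] right;

  @Setup
  public void setup() {
    // Two identical 4 KiB payloads so every call runs the full comparison loop.
    left = new byte[4096];
    right = new byte[4096];
    for (int i = 0; i < left.length; i++) {
      left[i] = right[i] = (byte) i;
    }
  }

  @Benchmark
  public boolean arrayEqualsInt() {
    // Loop index is an int, as before the change.
    int i = 0;
    final long length = left.length;
    while (i <= length - 8) {
      if (Platform.getLong(left, Platform.BYTE_ARRAY_OFFSET + i) !=
          Platform.getLong(right, Platform.BYTE_ARRAY_OFFSET + i)) {
        return false;
      }
      i += 8;
    }
    return true;
  }

  @Benchmark
  public boolean arrayEqualsLong() {
    // Loop index is a long, as after the change.
    long i = 0;
    final long length = left.length;
    while (i <= length - 8) {
      if (Platform.getLong(left, Platform.BYTE_ARRAY_OFFSET + i) !=
          Platform.getLong(right, Platform.BYTE_ARRAY_OFFSET + i)) {
        return false;
      }
      i += 8;
    }
    return true;
  }
}
```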

Deep dive
---------
Diving down to the machine-code level, we can see why the gap is so big. Listed below is the arm64 assembly generated by the OpenJDK 17 C2 compiler.

For `int i`, the machine code stays close to the source code, with no deep optimization. Safepoint polling is expensive in this short loop.
```
// jit c2 machine code snippet
  0x0000ffff81ba8904:   mov        w15, wzr              // int i = 0
  0x0000ffff81ba8908:   nop
  0x0000ffff81ba890c:   nop
loop:
  0x0000ffff81ba8910:   ldr        x10, [x13, w15, sxtw] // Platform.getLong(leftBase, leftOffset + i)
  0x0000ffff81ba8914:   ldr        x14, [x12, w15, sxtw] // Platform.getLong(rightBase, rightOffset + i)
  0x0000ffff81ba8918:   cmp        x10, x14
  0x0000ffff81ba891c:   b.ne       0x0000ffff81ba899c    // return false if not equal
  0x0000ffff81ba8920:   ldr        x14, [x28, #848]      // x14 -> safepoint
  0x0000ffff81ba8924:   add        w15, w15, #0x8        // i += 8
  0x0000ffff81ba8928:   ldr        wzr, [x14]            // safepoint polling
  0x0000ffff81ba892c:   sxtw       x10, w15              // extend i to long
  0x0000ffff81ba8930:   cmp        x10, x11
  0x0000ffff81ba8934:   b.le       0x0000ffff81ba8910    // if (i <= length - 8) goto loop
```

For `long i`, the JIT is able to apply much more aggressive optimizations. For example, the code snippet below unrolls the loop by a factor of four.
```
// jit c2 machine code snippet
unrolled_loop:
  0x0000ffff91de6fe0:   sxtw       x10, w7
  0x0000ffff91de6fe4:   add        x23, x22, x10
  0x0000ffff91de6fe8:   add        x24, x21, x10
  0x0000ffff91de6fec:   ldr        x13, [x23]          // unroll-1
  0x0000ffff91de6ff0:   ldr        x14, [x24]
  0x0000ffff91de6ff4:   cmp        x13, x14
  0x0000ffff91de6ff8:   b.ne       0x0000ffff91de70a8
  0x0000ffff91de6ffc:   ldr        x13, [x23, #8]      // unroll-2
  0x0000ffff91de7000:   ldr        x14, [x24, #8]
  0x0000ffff91de7004:   cmp        x13, x14
  0x0000ffff91de7008:   b.ne       0x0000ffff91de70b4
  0x0000ffff91de700c:   ldr        x13, [x23, #16]     // unroll-3
  0x0000ffff91de7010:   ldr        x14, [x24, #16]
  0x0000ffff91de7014:   cmp        x13, x14
  0x0000ffff91de7018:   b.ne       0x0000ffff91de70a4
  0x0000ffff91de701c:   ldr        x13, [x23, #24]     // unroll-4
  0x0000ffff91de7020:   ldr        x14, [x24, #24]
  0x0000ffff91de7024:   cmp        x13, x14
  0x0000ffff91de7028:   b.ne       0x0000ffff91de70b0
  0x0000ffff91de702c:   add        w7, w7, #0x20
  0x0000ffff91de7030:   cmp        w7, w11
  0x0000ffff91de7034:   b.lt       0x0000ffff91de6fe0
```

### What changes were proposed in this pull request?

A trivial change that replaces the loop index `i` of the `arrayEquals` method from `int` to `long`.

### Why are the changes needed?

To improve performance and fix a possible bug.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing unit tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#49568 from cyb70289/arrayEquals.

Authored-by: Yibo Cai <[email protected]>
Signed-off-by: Sean Owen <[email protected]>